Smart Paper Notes

Scalable Primitives for Generalized Sensor Fusion in Autonomous Vehicles

Sammy Sidhu , Linda Wang , Tayyab Naseer , Ashish Malhotra , Jay Chia , Aayush Ahuja , Ella Rasmussen , Qiangui Huang , Ray Gao

Categories: Computer Vision | Robotics

2021-12-01

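In autonomous driving, there has been an explosion in the use of deep neural networks for perception, prediction, and planning tasks. As autonomous vehicles (AVs) move closer to production, multi-modal sensor inputs and heterogeneous vehicle fleets with different sensor platforms are becoming increasingly common in the industry. However, neural network architectures typically target a specific sensor platform and are not robust to changes in input, which makes scaling and model deployment particularly difficult. Furthermore, most players still treat software and hardware as entirely independent problems. We propose a new end-to-end architecture, Generalized Sensor Fusion (GSF), designed so that both the sensor inputs and the target tasks are modular and modifiable. This enables AV system designers to easily experiment with different sensor configurations and methods, and opens up the ability to deploy on heterogeneous fleets using the same models shared across a large engineering organization. Using this system, we report experimental results in which we demonstrate near-parity between an expensive high-density (HD) LiDAR sensor and a cheap low-density (LD) LiDAR plus camera setup on the 3D object detection task. This paves the way for the industry to jointly design hardware and software architectures, as well as large fleets with heterogeneous configurations.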

translated by Google Translate

Floods Relevancy and Identification of Location from Twitter Posts using NLP Techniques

Muhammad Suleman , Muhammad Asif , Tayyab Zamir , Ayaz Mehmood , Jebran Khan , Nasir Ahmad , Kashif Ahmad

Categories: Natural Language Processing

2023-01-01

This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP), and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related posts from non-relevant social posts, while LETT is a Named Entity Recognition (NER) task that aims at extracting location information from the text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, DistilBERT, and ALBERT, obtaining F1-scores of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models, namely BERT, RoBERTa, and DistilBERT, obtaining F1-scores of 0.6256, 0.6744, and 0.6723, respectively.
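Since the results above are reported as F1-scores, a minimal sketch of how that metric is computed may help. The function name and the counts below are illustrative assumptions and are not taken from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a binary relevance classifier (not from the paper).
score = f1_score(true_positives=80, false_positives=20, false_negatives=20)
print(round(score, 4))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it is commonly preferred over accuracy when the relevant class is rare.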


Guidance Through Surrogate: Towards a Generic Diagnostic Attack

Muzammal Naseer , Salman Khan , Fatih Porikli , Fahad Shahbaz Khan

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-12-30

Adversarial training is an effective approach to make deep neural networks robust against adversarial attacks. Recently, different adversarial training defenses have been proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD. High adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as 'gradient masking'. In this work, we analyse the effect of label smoothing on adversarial training as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA). Our attack approach is based on a 'match and deceive' loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size. Furthermore, our proposed G-PGA is generic, so it can be combined with an ensemble attack strategy, as we demonstrate for the case of Auto-Attack, leading to efficiency and convergence speed improvements. More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
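For readers unfamiliar with the PGD baseline mentioned above, the following is a minimal sketch of a standard L-infinity PGD attack on a toy logistic-regression model. It is not G-PGA: there is no surrogate guidance or 'match and deceive' loss here. The model, the parameter values, and all function names are assumptions chosen for illustration.

```python
import math


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def logistic_loss(x, w, y):
    """Binary cross-entropy of a linear model with weights w on input x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def loss_gradient(x, w, y):
    """Gradient of the loss above with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def pgd_attack(x, w, y, eps=0.1, step=0.02, iters=10):
    """Standard PGD: signed gradient ascent, then projection onto the eps-ball."""
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_gradient(x_adv, w, y)
        # Move each coordinate in the direction that increases the loss.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back so that no coordinate differs from x by more than eps.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


# Toy input, weights, and label (hypothetical values).
x, w, y = [0.5, -0.2], [1.0, -2.0], 1
x_adv = pgd_attack(x, w, y)
print(logistic_loss(x, w, y), logistic_loss(x_adv, w, y))
```

Gradient masking, as described in the abstract, is the situation where `loss_gradient` gives uninformative directions for a defended model, so an attack like this one stalls even though adversarial examples exist.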


PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

Sheng Zhang , Salman Khan , Zhiqiang Shen , Muzammal Naseer , Guangyi Chen , Fahad Khan

Categories: Computer Vision

2022-12-11

Although existing semi-supervised learning models achieve remarkable success in learning with unannotated in-distribution data, they mostly fail to learn on unlabeled data sampled from novel semantic classes due to their closed-set assumption. In this work, we target a pragmatic but under-explored Generalized Novel Category Discovery (GNCD) setting. The GNCD setting aims to categorize unlabeled training data coming from known and novel classes by leveraging the information of partially labeled known classes. We propose a two-stage Contrastive Affinity Learning method with auxiliary visual Prompts, dubbed PromptCAL, to address this challenging problem. Our approach discovers reliable pairwise sample affinities to learn better semantic clustering of both known and novel classes for the class token and visual prompts. First, we propose a discriminative prompt regularization loss to reinforce semantic discriminativeness of prompt-adapted pre-trained vision transformer for refined affinity relationships. Besides, we propose a contrastive affinity learning stage to calibrate semantic representations based on our iterative semi-supervised affinity graph generation method for semantically-enhanced prompt supervision. Extensive experimental evaluation demonstrates that our PromptCAL method is more effective in discovering novel classes even with limited annotations and surpasses the current state-of-the-art on generic and fine-grained benchmarks (with nearly 11% gain on CUB-200, and 9% on ImageNet-100) on overall accuracy.


Self-Distilled Vision Transformer for Domain Generalization

Maryam Sultana , Muzammal Naseer , Muhammad Haris Khan , Salman Khan , Fahad Shahbaz Khan

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-07-25

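Recently, several domain generalization (DG) methods have been proposed that show encouraging performance; however, almost all of them build on convolutional neural networks (CNNs). There has been little to no progress on studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks that are often built on the i.i.d. assumption. This renders the real-world deployment of ViTs doubtful. In this paper, we attempt to explore ViTs for addressing the DG problem. Similar to CNNs, ViTs also struggle in out-of-distribution scenarios, and the main culprit is overfitting to the source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined self-distillation for ViTs. It reduces overfitting to the source domains by easing the learning of the input-output mapping problem through curating non-zero-entropy supervisory signals for intermediate transformer blocks. Further, it does not introduce any new parameters and can be seamlessly plugged into the modular composition of different ViTs. We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones on five challenging datasets. Moreover, we report favorable performance against recent state-of-the-art DG methods. Our code, along with pre-trained models, is publicly available at: https://github.com/maryam089/sdvit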
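One way to read "non-zero-entropy supervisory signals for intermediate transformer blocks" is as soft targets taken from the final block's prediction. The sketch below assumes a temperature-softened softmax and a KL-divergence loss; the abstract does not specify either, so the loss form, the temperature value, the logits, and all names here are assumptions rather than the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a softer distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(final_logits, intermediate_logits, temperature=3.0):
    """Match an intermediate block's prediction to the final block's soft prediction."""
    teacher = softmax(final_logits, temperature)        # soft target, entropy > 0
    student = softmax(intermediate_logits, temperature)
    return kl_divergence(teacher, student)


# Hypothetical class logits from the final block and one intermediate block.
final_logits = [4.0, 1.0, 0.5]
intermediate_logits = [1.5, 1.2, 0.9]
print(self_distillation_loss(final_logits, intermediate_logits))
```

Unlike a one-hot label, the soft target assigns some probability to every class, which is the sense in which the supervisory signal has non-zero entropy. No new parameters are needed because both predictions can come from the same classifier head.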


Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations

Hashmat Shadab Malik , Shahina K Kunhimon , Muzammal Naseer , Salman Khan , Fahad Shahbaz Khan

Categories: Computer Vision

2022-07-18

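Transferable adversarial attacks optimize adversaries from a pretrained surrogate model and a known label space in order to fool unknown black-box models. These attacks are therefore restricted by the availability of an effective surrogate model. In this work, we relax this assumption and propose Adversarial Pixel Restoration as a self-supervised alternative for training an effective surrogate model from scratch under the condition of no labels and few data samples. Our training approach is based on a min-max objective that reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model. Our proposed attack is complementary to adversarial pixel restoration and is independent of any task-specific objective, as it can be launched in a self-supervised manner. We successfully demonstrate the adversarial transferability of our approach to Vision Transformers as well as Convolutional Neural Networks for the tasks of classification, object detection, and video segmentation. Our code and pre-trained surrogate models are available at: https://github.com/hashmatshadab/apr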


ExAID: A Multimodal Explanation Framework for Computer-Aided Diagnosis of Skin Lesions

Adriano Lucieri , Muhammad Naseer Bajwa , Stephan Alexander Braun , Muhammad Imran Malik , Andreas Dengel , Sheraz Ahmed

Categories: Artificial Intelligence | Machine Learning

2022-01-04

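One principal impediment to the successful deployment of AI-based Computer-Aided Diagnosis (CAD) systems in clinical workflows is their lack of transparent decision making. Although commonly used explainable AI methods provide some insight into opaque algorithms, such explanations are usually convoluted and not readily comprehensible except by highly trained experts. The explanation of decisions regarding the malignancy of skin lesions from dermoscopic images demands particular clarity, as the underlying medical problem definition is itself ambiguous. This work presents ExAID (Explainable AI for Dermatology), a novel framework for biomedical image analysis that provides multi-modal concept-based explanations consisting of easy-to-understand textual explanations supplemented by visual maps justifying the predictions. ExAID relies on Concept Activation Vectors to map human concepts to those learned by arbitrary deep learning models in latent space, and on Concept Localization Maps to highlight concepts in the input space. This identification of relevant concepts is then used to construct fine-grained textual explanations, supplemented by concept-wise location information, to provide comprehensive and coherent multi-modal explanations. All information is comprehensively presented in a diagnostic interface for use in clinical routines. An educational mode provides dataset-level explanation statistics and tools for data and model exploration to aid medical research and education. Through rigorous quantitative and qualitative evaluation of ExAID, we show the utility of multi-modal explanations for CAD-assisted scenarios even in the case of wrong predictions. We believe that ExAID will provide dermatologists with an effective screening tool that they both understand and trust. Moreover, it will be the basis for similar applications in other biomedical imaging fields.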


Pre-Training Transformers for Domain Adaptation

Burhan Ul Tayyab , Nicholas Chua

Categories: Computer Vision

2021-12-18

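The Visual Domain Adaptation Challenge 2021 called for unsupervised domain adaptation methods that could improve the performance of models by transferring knowledge obtained from source datasets to out-of-distribution target datasets. In this paper, we utilize BeiT [1] and demonstrate its capability of capturing key attributes from source datasets and apply it to target datasets in a semi-supervised manner. Our method was able to outperform current state-of-the-art (SoTA) techniques and achieved 1st place in the VisDA Domain Adaptation Challenge, with an ACC of 56.29% and an AUROC of 69.79%.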


Self-supervised Video Transformer

Kanchana Ranasinghe , Muzammal Naseer , Salman Khan , Fahad Shahbaz Khan , Michael Ryoo

Categories: Computer Vision

2021-12-02

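In this paper, we propose self-supervised training for video transformers using unlabeled video data. From a given video, we create local and global spatiotemporal views with varying spatial sizes and frame rates. Our self-supervised objective seeks to match the features of these different views representing the same video, so as to be invariant to spatiotemporal variations in actions. To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in the Self-supervised Video Transformer (SVT). Further, owing to the flexibility of Transformer models, SVT supports slow-fast video processing within a single architecture using dynamically adjusted positional encoding, and supports long-term relationship modeling along spatiotemporal dimensions. Our approach performs well on four action recognition benchmarks (Kinetics-400, UCF-101, HMDB-51, and SSv2) and converges faster with small batch sizes. Code: https://git.io/j1juj.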


On Improving Adversarial Transferability of Vision Transformers

Muzammal Naseer , Kanchana Ranasinghe , Salman Khan , Fahad Shahbaz Khan , Fatih Porikli

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2021-06-08

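Vision transformers (ViTs) process input images as sequences of patches via self-attention, a radically different architecture from convolutional neural networks (CNNs). This makes it interesting to study the adversarial feature space of ViT models and their transferability. In particular, we observe that adversarial patterns found via conventional adversarial attacks show very low black-box transferability, even for large ViT models. However, we show that this phenomenon is only due to sub-optimal attack procedures that do not leverage the true representation potential of ViTs. A deep ViT is composed of multiple blocks with a consistent architecture comprising self-attention and feed-forward layers, where each block is capable of independently producing a class token. Formulating an attack using only the last class token (the conventional approach) does not directly leverage the discriminative information stored in the earlier tokens, leading to poor adversarial transferability of ViTs. Using the compositional nature of ViT models, we enhance the transferability of existing attacks by introducing two novel strategies specific to the architecture of ViT models. (i) Self-Ensemble: we propose a method to find multiple discriminative pathways by dissecting a single ViT model into an ensemble of networks. This allows class-specific information to be explicitly utilized at each ViT block. (ii) Token Refinement: we propose to refine the tokens to further enhance the discriminative capacity at each ViT block. Our token refinement systematically combines the class tokens with the structural information preserved within the patch tokens. An adversarial attack applied to such refined tokens within the ensemble of classifiers found in a single vision transformer has significantly higher transferability.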
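The self-ensemble idea described above can be sketched as an attack loss that aggregates over the class-token predictions of every block rather than only the last one. Summing per-block cross-entropy terms is an assumption made for illustration; the abstract does not state how the ensemble is combined, and the logits and function names below are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one prediction, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return log_sum_exp - logits[label]


def last_token_loss(per_block_logits, label):
    """Conventional attack objective: use only the final block's class token."""
    return cross_entropy(per_block_logits[-1], label)


def self_ensemble_loss(per_block_logits, label):
    """Assumed self-ensemble objective: sum the loss over every block's class token."""
    return sum(cross_entropy(logits, label) for logits in per_block_logits)


# Hypothetical logits obtained by passing each block's class token
# through a shared classification head (three blocks, three classes).
per_block_logits = [
    [0.2, 0.1, 0.0],
    [1.0, 0.3, -0.2],
    [3.0, 0.5, -1.0],
]
print(last_token_loss(per_block_logits, 0), self_ensemble_loss(per_block_logits, 0))
```

An attacker would maximize the aggregated loss with respect to the input, so the perturbation is shaped by every block's view of the class rather than by the final representation alone.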
